Abstract

The multiresolution diffusion entropy analysis is used to evaluate the stochastic information left 
in a time series after systematic removal of certain non-stationarities. This method allows us to 
establish whether the identified patterns are sufficient to capture all relevant information contained 
in a time series. If they are not, the method suggests the need for further interpretation to explain 
the residual memory in the signal. We apply the multiresolution diffusion entropy analysis to the 
daily count of births to teens in Texas from 1964 through 2000 because it is a typical example of a 
non-stationary time series, having an anomalous trend, an annual variation, and short-time 
fluctuations. The analysis is repeated for the three main racial/ethnic groups in Texas (White, 
Hispanic and African American), as well as for married and unmarried teens, during the years from 
1994 to 2000, and we study the differences that emerge among the groups. 






I. INTRODUCTION 



Time series analysis is traditionally done using linear models, such as analysis of variance 
and linear regression models. Underlying these techniques is the assumption that the 
phenomena of interest can be described by a few basic standard patterns. However, the 
variability of the statistical properties of a time series, such as the number of births to married 
and unmarried teens shown in Figure 1, can present more complex patterns than a periodic 
cycle or a linear trend, so certain precautions must be taken to avoid neglecting less evident 
but still important properties of a time series. In fact, for the teen birth data 
studied herein it was shown that after removing the background trend from the annual 
periodicity, the memory remaining depends upon the marital status of the teens [1, 2]. This 
fact suggests that the annual periodicity to which human fertility is related may have social 
origins rather than being simply due to a natural seasonal variation and the corresponding 
changes in light and temperature [3, 4]. The variability of the data might depend on political 
and social changes, as well as on holidays and school schedules that may induce 
unusual patterns in the data. 

Herein we suggest a multiresolution diffusion entropy analysis in order to evaluate the 
stochastic information left in a time series after a systematic removal, through a detrending 
procedure, of the memory contribution given by identified patterns. The goal is to study a 
phenomenon by identifying possible patterns, quantifying the amount of information that 
each pattern contributes to the data, and establishing whether the identified patterns are sufficient 
to capture all relevant information contained in the time series or whether new patterns 
need to be found. For didactic purposes we limit the analysis presented herein to verifying 
whether simple additive patterns such as linear trends and periodic cycles are sufficient to 
describe the complexity of the teen birth data. We adopt a basic regression model because of 
its simple phenomenological interpretations, but we expect that the multiresolution diffusion 
entropy analysis will also work with more complex detrending models. 

II. BIRTHS TO TEENAGERS IN TEXAS (1994-2000) 

Figure 1 shows the daily number of births to married and unmarried teen mothers in 
Texas from 1994 through 2000, a span of 2,557 days. During these 7 years, both counts of births are 
characterized by annual cycles that fluctuate around an apparent linear bias. In the case 
of unmarried teens this bias increases, whereas it decreases in the case of married 
teens. The characteristics of these data suggest that the simplest patterns can be captured 
by a least-squares fitting function that involves a linear trend plus two sinusoidal functions 
with 1-year and 1/2-year periodicities, respectively: 

$$ f(t) = a + b\,(t - 1994) + c_1 \sin[2\pi (t - \tau_1)] + c_{1/2} \sin[4\pi (t - \tau_{1/2})] . \qquad (1) $$

The coefficient $a$ gives the approximate mean value of daily births at the beginning of 1994, 
$b$ is the slope of the linear trend, giving the mean increase (positive value) or 
decrease (negative value) of daily births per year, $c_1$ and $c_{1/2}$ are the amplitudes of the 
1-year and 1/2-year periodicities and, finally, $\tau_1$ and $\tau_{1/2}$ measure the temporal shift of the 
two sinusoidal functions. The time $t$ is measured in years. The analysis shows that during 
the period 1994-2000 the mean of 99 daily births to unmarried teens is almost double the 
mean of 50 daily births to married teens. The number of births to 
unmarried teens increases at the rate of $b = 1.92$ daily births/year, whereas the number 
of births to married teens decreases at the rate of $b = -0.62$ daily births/year. Finally, 
the two smooth solid curves of Figure 1 clearly show that the 1/2-year harmonic has a 
much more prominent influence on the unmarried than on the married teens. 
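
A fit of the form of Eq. (1) can be sketched in a few lines of Python. The sketch below is only an illustration under stated assumptions, not the authors' fitting code: it assumes NumPy, the function name `fit_trend_and_cycles` is hypothetical, and the series is synthetic (built from the "All UnM" coefficients of Table I plus Gaussian noise of assumed standard deviation 10). Each shifted sinusoid is expanded into a sine and a cosine term so that the least-squares problem becomes linear.

```python
import numpy as np

def fit_trend_and_cycles(t, y, t0=1994.0):
    """Least-squares estimate of (a, b, c1, tau1, c_half, tau_half) in Eq. (1).

    t is in years. Each shifted sinusoid is expanded as
    c*sin(w*(t - tau)) = p*sin(w*t) + q*cos(w*t), with p = c*cos(w*tau) and
    q = -c*sin(w*tau), so that the fit is linear in the unknowns.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([
        np.ones_like(t), t - t0,                        # linear ramp
        np.sin(2 * np.pi * t), np.cos(2 * np.pi * t),   # 1-year cycle
        np.sin(4 * np.pi * t), np.cos(4 * np.pi * t),   # 1/2-year cycle
    ])
    (a, b, p1, q1, p2, q2), *_ = np.linalg.lstsq(X, y, rcond=None)
    c1 = np.hypot(p1, q1)
    tau1 = (np.arctan2(-q1, p1) / (2 * np.pi)) % 1.0       # shift in [0, 1) years
    c_half = np.hypot(p2, q2)
    tau_half = (np.arctan2(-q2, p2) / (4 * np.pi)) % 0.5   # shift in [0, 0.5) years
    return a, b, c1, tau1, c_half, tau_half

# Synthetic daily series over 2,557 days (NOT the Texas data).
rng = np.random.default_rng(0)
t = 1994.0 + np.arange(2557) / 365.25
y = (93.25 + 1.94 * (t - 1994.0)
     + 5.42 * np.sin(2 * np.pi * (t - 0.52))
     + 4.48 * np.sin(4 * np.pi * (t - 0.48))
     + rng.normal(0.0, 10.0, t.size))
print(fit_trend_and_cycles(t, y))
```

Because a shift $\tau$ is only defined up to one period, the sketch reports $\tau_1$ modulo one year and $\tau_{1/2}$ modulo half a year.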

Table I records the fitting parameter values obtained by repeating the same analysis for all 
racial/ethnic (White, Hispanic, African American) and marital groups of teenagers in Texas 
during the years from 1994 to 2000. Among the three racial/ethnic groups, the mean number 
of daily births to Hispanic teen mothers is the highest, with a mean of 80 births/day. The 
White group is second with a mean of 44 births/day, and the African American group is last with 
a mean of 25 births/day. The comparison of the counts of births between ethnic groups shows 
that the births to both White and African American teens decrease, whereas the count of 
births to Hispanic teens decreases from 1994 until 1996 and increases from 1997 to 1999. 
Moreover, for all racial/ethnic groups the mean number of births to unmarried teens is 
markedly higher than that for the married teens. For Whites and Hispanics the ratio between the births 
to unmarried and married young women is almost the same (27/17 = 1.59 for White and 
49/31 = 1.58 for Hispanic). By contrast, for African American teens the mean is 23 births 
to unmarried mothers versus only 2 births to married mothers. The ratio is 23/2 = 11.5 and 
is about seven times higher than that for the other two racial/ethnic groups. Therefore, 
young pregnant African American teens are much less likely to be married than are their 
White and Hispanic counterparts. 
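
The ratios quoted above follow directly from the rounded means in Table I. The short check below is only illustrative arithmetic on those rounded values; the variable names are hypothetical.

```python
# Mean daily births from Table I, as (unmarried, married) per group.
means = {"White": (27, 17), "Hispanic": (49, 31), "African American": (23, 2)}

ratios = {group: unm / mar for group, (unm, mar) in means.items()}
for group, ratio in ratios.items():
    print(f"{group}: {ratio:.2f}")
# White: 1.59
# Hispanic: 1.58
# African American: 11.50

# How many times larger the African American ratio is than the White one.
print(f"{ratios['African American'] / ratios['White']:.1f}")
# 7.2
```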

The detailed numerical analyses of the least-squares fitting curves stress further differences 
among the groups. The number of births to unmarried Hispanic teens grows at a 
rate of $b = 2.06$ daily births/year. In contrast, there is only a slight increase of births to 
unmarried White teens ($b = 0.35$), and a slight decrease in the number of births to unmarried 
African American teens ($b = -0.50$). Within the married group, the number of births to 
Hispanic teens is increasing slightly ($b = 0.07$), whereas the other two groups are decreasing slightly 
(White, $b = -0.65$; African American, $b = -0.04$). For all three racial/ethnic groups the 
behavior of the unmarried teens is always influenced by the 1/2-year periodicity. The strengths of 
the 1-year and 1/2-year periodicities are estimated by the amplitudes $c_1$ and 
$c_{1/2}$. For the unmarried groups the two amplitudes are quite similar, whereas for the married 
groups the annual amplitude $c_1$ is always much stronger than the semiannual amplitude 
$c_{1/2}$. We observe that the annual cycle is likely related to the one-year seasonal 
temperature/light cycle. The 1/2-year periodicity is likely to be related to the autumn and spring 
school semesters that, together with the Christmas and summer holidays, divide the year 
into two almost symmetric temporal periods. The data also present a weekly periodicity 
due to the separation of the week into working days (Monday to Friday) and weekends 
(Saturday and Sunday), when hospital activity is reduced. 



III. MULTIRESOLUTION DIFFUSION ENTROPY ANALYSIS AND TIME SERIES PATTERNS 

Time series fluctuations, commonly called noise, look random and uncorrelated, but 
usually they are not. Different methods have been suggested to extract information from 
the randomness of a time series; for example, autocorrelation analysis, Hurst analysis [5, 6] and 
detrended fluctuation analysis [7] have been used to determine the interdependence 
of random fluctuations. The success of these techniques over the last few years is due to the 
discovery that many natural phenomena described by correlated fluctuations satisfy certain 
scaling laws [8, 9]. A common feature of the above techniques is that they study the temporal 
evolution and the scaling of the variance. However, the entropy is a more complete indicator 
of the stochastic information contained in a distribution, and it is preferred herein. 

Diffusion entropy analysis (DEA) measures the time evolution of the Shannon entropy 
of the probability density function (pdf) in terms of the diffusion process generated by the 
time series. A time series of data $\{\xi_i\}$ may be interpreted as the fluctuations 
in a diffusion process. As in a random walk, we define trajectories by the superposition of 
these fluctuations 

$$ x^{(z)}(t) = \sum_{i=1}^{t} \xi_{i+z} , \qquad (2) $$



where $z = 0, 1, \ldots$. These trajectories generate a diffusion-like process that is described by 
a pdf $p(x,t)$, where $x$ denotes the variable collecting the fluctuations and $t$ is the diffusion 
time. The Shannon entropy is defined by 

$$ S(t) = -\int_{-\infty}^{+\infty} dx \, p(x,t) \ln[p(x,t)] . \qquad (3) $$

DEA measures the stochastic information contained in the variability of the smooth functions 
at different time scales. In fact, the diffusion trajectories $x^{(z)}(t)$ of Eq. (2) generate a pdf 
$p(x,t)$ whose dispersion at each time $t$ is related mainly to the smooth component of the 
signal characterized by the lower frequencies $f < 1/t$, because the frequencies $f > 1/t$ are 
smoothed over by the sum in Eq. (2). 
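
The construction of Eqs. (2) and (3) can be sketched numerically as follows. This is a minimal illustration, assuming NumPy and overlapping windows (one trajectory per shift $z$); the function name `diffusion_entropy` and the default bin width (the standard deviation of the input series) are choices made for this sketch, and the integral of Eq. (3) is replaced by a sum over histogram cells.

```python
import numpy as np

def diffusion_entropy(xi, times, bin_width=None):
    """Histogram estimate of S(t), Eq. (3), for the trajectories of Eq. (2)."""
    xi = np.asarray(xi, dtype=float)
    if bin_width is None:
        bin_width = xi.std()              # assumed default: std of the data
    # csum[k] holds the sum of the first k fluctuations.
    csum = np.concatenate(([0.0], np.cumsum(xi)))
    entropies = []
    for t in times:
        # One trajectory value per shift z: sum of t consecutive fluctuations.
        x = csum[t:] - csum[:-t]
        # Assign each value to a cell of size bin_width and count occupancies.
        cells = np.floor(x / bin_width).astype(int)
        _, counts = np.unique(cells, return_counts=True)
        p = counts / counts.sum()
        entropies.append(-np.sum(p * np.log(p)))
    return np.array(entropies)

# Usage on synthetic uncorrelated Gaussian noise (not the birth data).
rng = np.random.default_rng(1)
noise = rng.normal(size=50_000)
times = np.array([4, 8, 16, 32, 64])
print(diffusion_entropy(noise, times))
```

Using a fixed cell size for all diffusion times means that the estimated entropy differs from the continuous expression only by an additive constant, which does not affect how it grows with $t$.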

A constant signal, equivalent to a horizontal straight line, does not have any variability; 
therefore, its entropy is zero. In the presence of a correlated noise, like fractional 
Brownian motion (fBm), the pdf of the diffusion process satisfies the scaling equation 

$$ p(x,t) = \frac{1}{t^{\delta}} F\!\left(\frac{x}{t^{\delta}}\right) . \qquad (4) $$

Inserting Eq. (4) into Eq. (3), we obtain 

$$ S(t) = \delta \ln(t) + A , \qquad (5) $$

where $A$ is a constant. The coefficient $\delta$ of Eq. (5) is the scaling exponent. The exponent $\delta$ 
can be evaluated by fitting the entropy with functions of the form $f_S(t) = \delta \ln(t) + K$ that, 
when plotted on linear-log graph paper, yield straight lines. For fBm the scaling exponent 
$\delta$ coincides with the Hurst exponent $H$. For random noise with finite variance, the 
diffusion distribution $p(x,t)$ will converge, according to the central limit theorem, to a 
Gaussian distribution with $\delta = H = 0.5$. A correlated or persistent noise (tendency to 
conserve the direction of the motion) would have $\delta > 0.5$. An anticorrelated or antipersistent 
noise (tendency to invert the direction of the motion) would have $\delta < 0.5$. 
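
The fit of $f_S(t) = \delta \ln(t) + K$ amounts to a straight-line regression of the entropy against $\ln t$. The sketch below assumes NumPy; the function names and the tolerance `tol` used to label an exponent are hypothetical choices for illustration, not values from the analysis. The example entropy is the analytic Gaussian expression $S(t) = \ln(\sigma\sqrt{2\pi e}) + \delta \ln t$, used here only as synthetic input.

```python
import numpy as np

def fit_scaling_exponent(times, entropy):
    """Straight-line fit of S(t) = delta * ln(t) + K; returns (delta, K)."""
    delta, K = np.polyfit(np.log(times), entropy, 1)
    return delta, K

def classify(delta, tol=0.05):
    """Label an exponent relative to 0.5 (tol is an arbitrary, assumed margin)."""
    if delta > 0.5 + tol:
        return "persistent"
    if delta < 0.5 - tol:
        return "antipersistent"
    return "uncorrelated"

# Synthetic entropy of a Gaussian pdf whose width grows as sigma * t**delta.
times = np.array([1, 2, 4, 8, 16, 32, 64, 128])
sigma, true_delta = 1.0, 0.65
S = np.log(sigma * np.sqrt(2 * np.pi * np.e)) + true_delta * np.log(times)

delta, K = fit_scaling_exponent(times, S)
print(delta, classify(delta))
```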



The uppermost curves of Figure 2 show the DEA applied to the number of births to 
married and unmarried teens from 1994 to 2000. The pdf $p(x,t)$ is estimated through 
histograms with bin widths equal to the standard deviation of the analyzed datasets. This 
has the effect of normalizing the results. These curves do not show a real scaling behavior 
like Eq. (5) but rather patterns with a short oscillation related to the weekly periodicity and a 
longer oscillation of one year. The entropy grows and suddenly decreases, reaches a minimum 
at one year and then increases again. The dynamical reason for this behavior is 
that a given periodicity of the data causes a periodic convergence of distinct 
trajectories. After an initial spreading, with a consequent increase of both variance 
and entropy, at the end of the oscillation there is an incomplete regression back to the initial 
condition. 
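
The periodic convergence of trajectories can be seen in an idealized case. The sketch below assumes NumPy and uses a noise-free sinusoid with a hypothetical period of 365 samples; for such a signal the regression back to the initial condition is complete, whereas in real data the superimposed noise makes it only partial.

```python
import numpy as np

P = 365                                   # hypothetical period, in samples
i = np.arange(10 * P)
xi = np.sin(2 * np.pi * i / P)            # purely periodic "fluctuations"
csum = np.concatenate(([0.0], np.cumsum(xi)))

def spread(t):
    """Standard deviation of the trajectory values x(t) over all shifts z."""
    x = csum[t:] - csum[:-t]
    return x.std()

half, full = spread(P // 2), spread(P)
# After half a period the trajectories are widely spread; after a full
# period every trajectory has summed one complete cycle and returns to zero.
print(half, full)
```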

However, Figure 2 reveals much more about the time series when a multiresolution diffusion 
entropy analysis is used. The idea is to extract from the data the information that may be 
related to some noticeable patterns in the data. In the teen birth case, a good choice is 
given by the simple regression model of Eq. (1), which provides information about the linear 
growth and the two main periodicities in the time series. We detrend this information 
from the data and apply the DEA to the resulting data set. In this way we determine the 
stochastic information remaining in the detrended time series. 

The ideal goal would be to find the solution of the underlying process, that is, an analytic 
function that makes the entropy of the system vanish after the detrending procedure. A 
more modest goal is to extract all relevant information, leaving only random noise. So, we 
first apply the DEA to the birth data after detrending the linear ramp $y = a + bt$. The second 
uppermost curves in Figure 2 show the results. We see a decrease in the entropy because 
we have removed part of the information from the time series, but the complex bending 
shape with the 1-year periodicity remains. The decrease in the entropy is stronger for the 
unmarried than for the married teens because their rate of growth, $b = 1.94$ births/year, 
is larger. The second step is to detrend the annual periodicity in addition to the linear ramp 
(the third uppermost curves). The entropy again decreases, as expected, but in the case of 
unmarried teens a clear 1/2-year cycle appears. The figure gives a visual impression of how 
much more important the 1/2-year periodicity is for the unmarried teens than it is for the 
married teens. In the third step we detrend the linear ramp plus both the 1-year and 1/2-year 
periodicities (the lowest curves shown in Figure 2). The entropy continues to decrease 
at every diffusion time. This decrease in entropy is very small for the married teens, 
because for them the 1/2-year cycle does not contain very much information. Finally, we 
see that the weekly cycle emerges more and more clearly. 
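
The three detrending steps described above can be sketched as a sequence of nested least-squares fits. This is a minimal illustration assuming NumPy; the function name `detrending_stages` and the stage labels are hypothetical, and the sketch refits the coefficients at every stage, which is an assumption of the sketch and may differ from subtracting the components of a single full fit of Eq. (1). The input is a synthetic series, not the birth data. Each residual series would then be passed to the DEA.

```python
import numpy as np

def detrending_stages(t, y, t0=1994.0):
    """Residuals after removing the ramp, then also the 1-year cycle,
    then also the 1/2-year cycle (each stage refitted by least squares)."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    ramp = [np.ones_like(t), t - t0]
    annual = [np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)]
    semiannual = [np.sin(4 * np.pi * t), np.cos(4 * np.pi * t)]
    stages = {
        "ramp": ramp,
        "ramp + 1-year": ramp + annual,
        "ramp + 1-year + 1/2-year": ramp + annual + semiannual,
    }
    residuals = {}
    for name, columns in stages.items():
        X = np.column_stack(columns)
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        residuals[name] = y - X @ coef
    return residuals

# Synthetic series with a ramp, two cycles and noise of assumed std 10.
rng = np.random.default_rng(2)
t = 1994.0 + np.arange(2557) / 365.25
y = (93.25 + 1.94 * (t - 1994.0)
     + 5.42 * np.sin(2 * np.pi * (t - 0.52))
     + 4.48 * np.sin(4 * np.pi * (t - 0.48))
     + rng.normal(0.0, 10.0, t.size))

residuals = detrending_stages(t, y)
for name, r in residuals.items():
    print(name, r.std())
```

Because the models are nested, the residual variance cannot increase from one stage to the next, which mirrors the progressive lowering of the entropy curves in Figure 2.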

Figure 3 shows the linear increase in entropy in logarithmic time after the complete 
detrending of the deterministic process in Eq. (1) for all marital and racial/ethnic groups 
from 1994 to 2000. It appears as if the special condition of the diffusion entropy, described by 
Eq. (5), has been reached. Figure 3A shows that the residual noise is not an uncorrelated 
random process. In fact, the two slopes, $\delta = 0.65$ (married) and $\delta = 0.60$ (unmarried), 
are significantly larger than the value $\delta = 0.5$ of uncorrelated random noise, implying that the 
underlying process, after the detrending procedure, retains some anomalous persistence. So 
there is still some residual memory or information that the simple regression model described 
by Eq. (1) is not able to extract. Moreover, we notice the similarity of the curves of 
Figures 3A and 3C. This may mean that the anomalous memory left in the data analyzed 
in Figures 2 and 3A is due primarily to the Hispanic group. Hispanic teens also form the 
largest group in terms of number of teen births for this time period. Only in the case of 
married White and both married and unmarried African American teens is the slope close to the 
value $\delta = 0.5$, a fact that may indicate statistical randomness of the data 
after the detrending process. This would mean that the regression model extracts 
most of the important information from the datasets. Instead, for both married and unmarried 
Hispanic teens and for unmarried White teens the slope is significantly higher than that expected 
for uncorrelated random noise, $\delta > 0.60$, indicating that the corresponding data show some 
anomalous non-periodic persistent patterns that the simple regression model represented by 
Eq. (1) is not able to capture. 

Figure 3B shows that the main difference between married and unmarried teens is in the 
White group and indicates that births to unmarried White teens during the period 1994-2000 
were subject to some anomalous change. Finally, Figure 3 shows differences among the 
racial/ethnic groups in the weekly cycle. The likelihood of a weekday birth is greatest for 
Hispanics and for Whites and slightly greater for unmarried than for married teens. African 
Americans, instead, do not show the weekday preference, a fact that may increase their odds 
of a neonatal death because it suggests that African American teens, for a variety of 
reasons, deliver their babies as emergencies. 

IV. CONCLUSIONS 



The non-stationarity of a time series is due to the presence of patterns that need to be 
identified for a full understanding of a phenomenon. Simple regression models that 
make use of linear trends and evident periodic cycles may capture only the basic patterns of a 
complex system. This fact requires the adoption of methods of analysis capable of 
establishing whether all relevant information is captured by a given simple model or whether 
the model should be supplemented with more complex patterns that need to be identified for 
a better understanding of the phenomenon. The multiresolution diffusion entropy analysis 
proposed herein addresses this problem by evaluating 
the amount of memory or stochastic information left in a time series after the systematic 
removal of the memory contribution associated with the already identified patterns. 

For didactic purposes we have applied our method to the analysis of the number 
of births to teenagers in Texas from 1994 to 2000 for different groups. Table I summarizes 
all the information that the simple regression model described by Eq. (1) is able to capture. 
Nevertheless, Figures 2 and 3 show that such a model is not able to extract all relevant 
information from the data. The diagram in Figure 2 shows the effect on the global 
information left in a time series of the systematic removal of the components of the 
non-stationarities associated with the regression model. This type of diagram allows one to 
visually estimate the importance of every identified pattern. Figure 3 shows that, after the 
removal from the data of the memory associated with the entire regression model described 
by Eq. (1), some datasets still retain memory patterns that manifest themselves in the 
superdiffusive character ($\delta > 0.5$) of the trajectories generated by such detrended time 
series. 

Perhaps the implementation of Welfare Reform in Texas in the mid-1990s was responsible 
for the long-range changes causing the superdiffusive behavior with $\delta = 0.60$ that is seen in 
the detrended time series of the unmarried White teen group, but not in that of the married White 
group. This type of reform was specifically developed to decrease non-marital childbearing, 
and our analysis suggests that it was particularly effective for the unmarried White teenagers. 
Figure 3 shows that neither Hispanic nor African American teenagers present such a 
disparity between married and unmarried teens. Therefore, perhaps the Welfare Reform in 
Texas was less effective for these racial/ethnic groups. However, while the data regarding 
the African American group look random after the removal of the memory associated with 
the regression model, the Hispanic group shows some anomalous residual memory ($\delta = 0.62$ 
for unmarried and $\delta = 0.65$ for married). Perhaps, because this behavior concerns both 
married and unmarried Hispanic groups, this residual memory is a manifestation of an 
anomalous change of culture and/or, as may appear more plausible, of a change in the Hispanic 
teenager population due to immigration from 1994 to 2000 and of how this group interacts 
with the state social welfare policies. 

Acknowledgment : 

The authors are grateful to Prof. Patti Hamilton of Texas Woman's University in Denton 
(TX) for providing the teen birth data. N.S. gratefully acknowledges the support from ARO 
grant DAAG5598D0002. 



[1] N. Scafetta, P. Hamilton and P. Grigolini, "The Thermodynamics of Social Process: the Teen 
Birth Phenomenon," Fractals 9, 193 (2001). 

[2] N. Scafetta, P. Grigolini, P. Hamilton and B. J. West, "Non-extensive diffusion entropy analysis 
and teen birth phenomena," Chapter of the forthcoming volume: "Interdisciplinary applications 
of ideas from nonextensive statistical mechanics and thermodynamics," M. Gell-Mann 
and C. Tsallis Eds., Oxford University Press (2003). 

[3] D. A. Lam and J. A. Miron, "The effects of temperature on human fertility," 
Demography 33 (3), 291-305 (1996). 

[4] N. Rojansky, A. Brzezinski, J. Schenker, "Seasonality in human reproduction: an update," 
Human Reproduction 7 (6), 735-745 (1992). 

[5] H. E. Hurst, R. P. Black, Y. M. Simaika, Long-Term Storage: An Experimental Study, 
Constable, London (1965). 

[6] J. Feder, Fractals, Plenum Publishers, New York (1988). 

[7] C.-K. Peng, S. V. Buldyrev, S. Havlin, M. Simons, H. E. Stanley, and A. L. Goldberger, 
"Mosaic organization of DNA nucleotides," Phys. Rev. E 49, 1685 (1994). 

[8] B. J. West, Physiology, Promiscuity and Prophecy at the Millennium: A Tale of Tails, World 
Scientific, New Jersey (1999). 

[9] B. B. Mandelbrot, The Fractal Geometry of Nature, Freeman, New York (1983). 

[10] N. Scafetta, P. Grigolini, P. Hamilton and B. J. West, cond-mat/0207536, unpublished. 

[11] N. Scafetta, P. Grigolini, "Scaling detection in time series: diffusion entropy analysis," Phys. 
Rev. E 66, 036130 (2002). 

[12] P. Allegrini, V. Benci, P. Grigolini, P. Hamilton, M. Ignaccolo, G. Menconi, L. Palatella, G. 
Raffaelli, N. Scafetta, M. Virgilio, J. Jang, "Compression and Diffusion: A Joint Approach to 
Detect Complexity," Chaos, Solitons & Fractals 15 (3), 517-535 (2003). 

[13] L. E. Reichl, Statistical Physics, J. Wiley, New York (1998). 






                  All                  White                    Hispanic                   Afr. A.
                 Mar     UnM       All     Mar     UnM       All     Mar     UnM       All     Mar     UnM
Mean              50      99        44      17      27        80      31      49        25       2      23
Max               90     157        78      40      52       135      55      91        46       9      44
Min               21      55        16       3       6        39      11      21         9               9
Std Dev         10.5    17.5      10.7     5.5     7.4      14.2     6.9    10.3       5.9     1.4     5.6
Skew            0.10    0.09     -0.02    0.35    0.16      0.19    0.15    0.26      0.23    0.84    0.24
Kurt           -0.16   -0.52     -0.55    0.09   -0.30     -0.13   -0.17   -0.12     -0.02    0.77    8e-4
a              52.28   93.25     44.66   18.94   25.62     72.96   31.00   41.80     26.99    1.92   25.01
b              -0.62    1.94     -0.32   -0.65    0.35      2.11   0.073    2.06     -0.54   -0.04   -0.50
c_1             3.46    5.42      1.71    0.80    0.93      5.72    2.55    3.17      1.45    0.11    1.39
c_{1/2}         0.94    4.48      1.48    0.32    1.17      2.80    0.63    2.14      1.21   0.059    1.17
\tau_1          0.48    0.52      0.49    0.46    0.51      0.50    0.49    0.50      0.56    0.40    0.58
\tau_{1/2}      0.49    0.48      0.50    0.52    0.50      0.47    0.46    0.47      0.47    0.55    0.47


TABLE I: Basic statistical analysis of the counts of births to teens in Texas during the years from 
1994 to 2000 for the three main racial/ethnic groups (White, Hispanic and African American) 
and of the counts of births to married (Mar) and unmarried (UnM) teen women. For each group we tabulate the 
mean, the maximum and the minimum value, the standard deviation, the skewness, the kurtosis 
and the coefficients of the least-squares fit using Eq. (1). 

FIG. 1: Daily births to unmarried and married teens in Texas during the years 1994 to 2000. The 
solid curves are obtained using the two fitting equations at the bottom of the figure. 

FIG. 2: Multiresolution diffusion entropy analysis of the daily count of births for married and 
unmarried teens from 1994 to 2000 at different degrees of detrending using the linear model of 
Eq. (1). The uppermost solid curves are the DEA of the original data without any detrending. 
The dashed curves show the entropy of the data detrended of the linear ramp $y = a + bt$. The dotted 
curves are the DEA of the data detrended of the linear ramp plus the annual periodicity. Finally, 
the lowest curves (points) show the diffusion entropy of the data detrended of the linear ramp plus both 
the 1-year and 1/2-year periodicities. The straight lines indicate the slope of the diffusion entropy of 
random noise, characterized by $\delta = 0.5$. 

FIG. 3: Diffusion entropy analysis after the complete detrending of the linear model represented 
by Eq. (1). Married (solid lines) and unmarried (dashed lines) teen births for the three racial/ethnic 
groups (White, Hispanic and African American) from 1994 to 2000 are analyzed. 